@tombburnell tombburnell commented Feb 3, 2017

This PR helps cope with a major issue where duplicates get processed multiple times. It should ignore any files when more than one has the same signature.

  • I've avoided opening the file twice (once for the sig, once for processing)
  • I've made it cope with the case where more than 2 files have the same sig
  • I've added METRICs for errors, duplicates and truncations so we can monitor them in Splunk
  • and added a debug flag
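
For readers following along, the duplicate handling described above could be sketched roughly like this. It is only an illustration, not the code in this PR: it assumes the signature is simply the first N bytes of the file, and `SIGNATURE_LENGTH`, `read_signature` and `unique_files` are hypothetical names. It also does not attempt the "open the file only once" optimisation mentioned in the first bullet.

```python
from collections import defaultdict

SIGNATURE_LENGTH = 16  # hypothetical value; stands in for self.signatureLength


def read_signature(path, length=SIGNATURE_LENGTH):
    """Return the first `length` bytes of the file (assumed signature scheme)."""
    with open(path, "rb") as f:
        return f.read(length)


def unique_files(paths):
    """Return, sorted, only the paths whose signature no other path shares."""
    by_sig = defaultdict(list)
    for path in paths:
        by_sig[read_signature(path)].append(path)
    # A group of 2, 3 or more files with the same signature is dropped whole.
    return sorted(p for group in by_sig.values() if len(group) == 1 for p in group)
```

Grouping first and filtering afterwards means the "more than 2 files with the same sig" case needs no special handling.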

sender Outdated
f.close();

finally:
f.close();

The file might be closed twice, possibly resulting in an exception.

sender Outdated
log("Exception: %s" % e)
log("METRIC ns=forwarder.error.preprocess file=\"%s\" exception=\"%s\"" % (fn, str(e).replace("\n", "")))
log("Exception=\"%s\"" % e)
f.close();

The file might not be open at this point. To guarantee that it is, put the f = open before the try:
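
A minimal sketch of the layout suggested in the two comments above, with the open outside the main try and a single close in finally. The names `process_file` and `handle_line` are hypothetical and this is not the PR's actual code. As far as I know, calling close() a second time on a built-in Python file object has no effect, so the more serious risk is the second one: if open() itself fails inside the try, f is never bound and f.close() in the handler raises a NameError.

```python
def process_file(fn, handle_line, log=print):
    """Process fn line by line; return True on success, False on any error."""
    try:
        f = open(fn)
    except (IOError, OSError) as e:
        # open() failed, so there is nothing to close.
        log("Could not open %s: %s" % (fn, e))
        return False
    try:
        for line in f:
            handle_line(line)
        return True
    except Exception as e:
        log("Exception=\"%s\"" % e)
        return False
    finally:
        f.close()  # the only close; f is guaranteed to be open here
```

A with open(fn) as f: block would give the same guarantee with less code, if the surrounding structure allows it.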

sender Outdated
log("File no longer exists: %s" % fn)
continue
if os.path.getsize(fn) < self.signatureLength and not self.isCompressed(fn):
log("Skipping as sig too short or compressed: %s" % fn)

Shouldn't the message here be 'skipping as file too short and not compressed'?

sender Outdated
if should_stop():
return
for line in f:
debug("line %s" % line.replace("\n",""))

Calls like this in production will first resolve the string template, only to find out that debug is OFF. So the ideal, for performance in production, is:
if debug_on: debug(...)

@@ -0,0 +1,217 @@
#!/usr/bin/env awk -f

Is this file part of this PR?

3 participants